Fast Bit-Reversals on Uniprocessors and Shared-Memory Multiprocessors

Authors

Zhao Zhang

Xiaodong Zhang

Abstract

In this paper, we examine different methods using techniques of blocking, buffering, and padding for efficient implementations of bit-reversals. We evaluate the merits and limits of each technique and its application and architecture-dependent conditions for developing cache-optimal methods. Besides testing the methods on different uniprocessors, we conducted both simulation and measurements on two commercial symmetric multiprocessors (SMP) to provide architectural insights into the methods and their implementations. We present two contributions in this paper: (1) Our integrated blocking methods, which match cache associativity and translation-lookaside buffer (TLB) cache size and which fully use the available registers, are cache-optimal and fast. (2) We show that our padding methods outperform other software-oriented methods, and we believe they are the fastest in terms of minimizing both CPU and memory access cycles. Since the padding methods are almost independent of hardware, they could be widely used on many uniprocessor workstations and multiprocessors.
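For readers unfamiliar with the operation, the following is a minimal sketch of a plain out-of-place bit-reversal permutation, plus a toy index mapping that illustrates the general idea of padding. It is not the authors' implementation: the function names, the `row_len` and `pad` parameters, and the layout are all assumptions made for illustration. The naive loop reads the source sequentially but writes the destination with large power-of-two strides, which is the access pattern that blocking, buffering, and padding are meant to tame.

```python
def bit_reverse(i: int, bits: int) -> int:
    """Return i with its lowest `bits` bits in reversed order."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # shift the lowest bit of i into r
        i >>= 1
    return r


def bit_reversal_copy(x: list, bits: int) -> list:
    """Naive out-of-place bit-reversal permutation: y[rev(i)] = x[i]."""
    n = 1 << bits
    if len(x) != n:
        raise ValueError("len(x) must equal 2**bits")
    y = [None] * n
    for i in range(n):
        # Consecutive i values land n/2, n/4, ... elements apart in y.
        y[bit_reverse(i, bits)] = x[i]
    return y


def padded_index(j: int, row_len: int, pad: int) -> int:
    """Hypothetical padding scheme (an assumption, not taken from the paper):
    leave `pad` unused slots after every `row_len` elements so that
    power-of-two strides no longer map to the same cache sets."""
    return j + (j // row_len) * pad


if __name__ == "__main__":
    print(bit_reversal_copy(list(range(8)), 3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

With padding, the destination would be allocated with room for the extra slots, and each write would go to `padded_index(bit_reverse(i, bits), row_len, pad)` instead of `bit_reverse(i, bits)`. Suitable values of `row_len` and `pad` depend on the cache line size and associativity of the target machine.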


Similar resources

The Use of Caching in Decoupled Multiprocessors with Shared Memory

In the following we evaluate the costs and benefits of using a cache memory with a decoupled architecture supporting shared memory in both the uniprocessor and multiprocessor cases. Firstly we identify the performance bottleneck of such architectures, which we define as Loss of Decoupling costs. We show that in both uniprocessors and multiprocessor machines with high latency such costs can greatl...


RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors

This paper describes RSIM, the Rice Simulator for ILP Multiprocessors, Version [...]. RSIM simulates shared-memory multiprocessors and uniprocessors built from processors that aggressively exploit instruction-level parallelism (ILP). RSIM is execution-driven and models state-of-the-art ILP processors, an aggressive memory system, and a multiprocessor coherence protocol and interconnect, including conten...


False Sharing and Spatial Locality in Multiprocessor Caches

The performance of the data cache in shared-memory multiprocessors has been shown to be different from that in uniprocessors. In particular, cache miss rates in multiprocessors do not show the sharp drop typical of uniprocessors when the size of the cache block increases. The resulting high cache miss rate is a cause of concern, since it can significantly limit the performance of multiprocessors....


Scalable Reader-Writer Synchronization for Shared-Memory Multiprocessors

Reader-writer synchronization relaxes the constraints of mutual exclusion to permit more than one process to inspect a shared object concurrently, as long as none of them changes its value. On uniprocessors, mutual exclusion and reader-writer locks are typically designed to de-schedule blocked processes; however, on shared-memory multiprocessors it is often advantageous to have processes busy wa...


Cache-Affinity Scheduling for Fine Grain Multithreading

Cache utilisation is often very poor in multithreaded applications, due to the loss of data access locality incurred by frequent context switching. This problem is compounded on shared memory multiprocessors when dynamic load balancing is introduced and thread migration disrupts cache content. In this paper, we present a technique, which we refer to as ‘batching’, for reducing the negative impa...



Journal:

SIAM J. Scientific Computing

Volume 22

Publication date: 2001
